Method for time-scale modification ("TSM") of a signal, for example, a voice signal, wherein starting
positions of blocks in an input signal, referred to as analysis windows, are varied and an output signal is
reconstructed by overlapping analysis windows using fixed window offsets, i.e., the duration of overlap between
analysis windows is fixed during reconstruction. This is done by searching for segments of the input signal which
are similar to the previous portion of the output signal. In one embodiment of the present invention, a
cross-correlation is used as a similarity measure to evaluate such similarity, and the cross-correlation uses a fixed,
predetermined minimum number of samples. The starting position of the analysis window which results in the
greatest similarity in the overlapping regions is determined as the starting position which provides the largest value
of cross-correlation in the overlapping regions. Several cross-correlations are evaluated by shifting the analysis
window over a predetermined number of samples, removing the first shifted samples from the evaluation each time,
and using the same, predetermined number of samples in each evaluation to determine the "best" starting
position for an analysis window. Finally, the predetermined number of samples from the beginning of the
analysis window are averaged with the predetermined number of samples from the end of the previous portion of
the output signal, and the remaining samples in the window are appended to the averaged segment of the
previous portion of the output signal.
Technical Field of the Invention

The present invention relates to a method for time-scale modification ("TSM"), i.e., changing the rate of
reproduction, of a signal and, in particular, to a method for time-scale modification of a sampled signal by
time-domain processing of the sampled signal to provide reproduction of the signal at a wide variety of
playback rates without an accompanying change in local periodicity.

Background of the Invention 

A need exists in the art for a method for time-scale modification of acoustic signals such as speech or
music and, in particular, a need exists for such a method which will provide time-scale modification without
modifying the pitch or local period of the time-scale modified signals. Thus, a need exists for a method for
changing the perceived rate of articulation while ensuring that the local pitch period of the resulting signal
remains unchanged, i.e., there are no "Alvin the Chipmunk" effects, and that no audible splicing,
reverberation, or other artifacts are introduced.

Specifically, time-scale modification ("TSM") of a signal by time-scale compression, i.e., a method for
speeding up the playback rate of the signal, or by time-scale expansion, i.e., a method for slowing down the
playback rate of the signal, is needed to match the time-scale of the signal to a predetermined duration.
For example, TSM can be used: (a) by a radio station to speed up dance music; (b) by a blind person to
speed up a recorded lecture; (c) by a student of a foreign language to slow down instructional material; (d)
by an editor to synchronize a dubbed sound track with a video signal and to compress them into convenient
time slots; (e) by a secretary to slow down or speed up a dictation tape for transcription; (f) by a voicemail
system to provide a message to a listener at a faster or slower rate than that at which the message was
recorded; and so forth.

When a segment of an input signal is compressed to speed up the signal, the informational content of
the compressed signal is reduced relative to that contained in the input signal to produce an output
segment of shorter duration. Ideally, compression should delete an integer multiple of local pitch periods,
and these deletions should be distributed evenly throughout the input segment. Further, to preserve
intelligibility, no phoneme should be removed completely.

When a segment of an input signal is expanded to slow down the signal, the information content of the
expanded signal is increased relative to that contained in the input signal to produce an output segment of
longer duration. Ideally, expansion should insert additional pitch periods which are distributed evenly
throughout the input segment. This proves to be difficult in practice, however, since the local pitch period
varies across phonemes and may be difficult to gauge during nonperiodic portions of a speech signal such
as fricatives.

Several methods have been developed in the prior art to provide TSM. Previously, TSM was
accomplished using three basic methods: frequency-domain processing methods, analysis/synthesis methods,
and time-domain processing methods. However, all of these prior art methods have drawbacks. For
example, an article entitled "Signal Estimation from Modified Short-Time Fourier Transform" by D. W.
Griffin and J. S. Lim in IEEE Transactions on ASSP, Vol. ASSP-32, No. 2, April, 1984, pp. 236-243,
introduced a frequency-domain processing method which iteratively synthesizes an output signal having a
spectrogram which is a compressed or expanded version of a spectrogram of an input signal. Although the
disclosed method works well on almost any acoustic material, it has a drawback in that it requires a large
amount of computation. As a result, even though this prior art frequency-domain processing method is
robust, it is so computationally intensive that it cannot be utilized in many real-time applications.

Analysis/synthesis methods operate by reducing an input speech signal to a set of time-varying
parameters which can be time-scaled, this being referred to as analysis, and by utilizing the time-varying
parameters to construct a time-scale modified signal, this being referred to as synthesis. For example, a
method suggested by T. F. Quatieri and R. J. McAulay in an article entitled "Speech Transformations
Based on a Sinusoidal Representation," IEEE Transactions on ASSP, Vol. ASSP-34, December, 1986, pp.
1449-1464, utilizes a limited number of sinusoids to model a speech signal. Then, in accordance with the
disclosed method, the time-scale of the input signal is modified by varying the rate at which the sequence
of sinusoids is played back. Although such analysis/synthesis methods require less computation than
frequency-domain processing methods, they have a drawback in that they are restricted to signals which
can be represented by a limited number of time-varying parameters. As a result, analysis/synthesis
methods generally perform poorly on more complex signals, such as speech signals which are corrupted by
noise or which contain music.

Time-domain methods operate by inserting or deleting segments of a speech signal. One of the original
time-domain methods of TSM was proposed in the 1940s and entailed splicing, i.e., abutting, different
regions of a signal at a fixed rate to compress or expand tape recordings. This method results in
discontinuities in transitions between inserted or deleted segments, and such discontinuities lead to
bothersome clicks and pops in the resulting time-scale modified signal.

Several attempts have been made in the art to minimize the effects of inter-segment transitions in a
time-scale modified signal by improving the splicing method or by windowing adjacent segments. In
general, these methods improve quality at the expense of increasing complexity. One such method of
time-domain TSM, i.e., "Time-Domain Harmonic Scaling" ("TDHS"), is disclosed in an article entitled
"Time-Domain Algorithms for Harmonic Bandwidth Reduction and Time Scaling of Speech Signals" by D. Malah,
IEEE Transactions on ASSP, Vol. ASSP-27, April, 1979, pp. 121-133. This article discloses a TDHS
algorithm which improves on the original method of splicing by synchronizing splice points to a local pitch
period and by using overlap-add techniques to fade smoothly between the splices. In particular, the TDHS
algorithm operates by determining the location of each pitch period in the input signal to be modified and
then by segmenting the signal around these pitch periods to achieve the desired modification. In
accordance with this TDHS method, an integer number of pitch periods has to be inserted or deleted, and it
is necessary to maintain a record of the modifications to ensure that the appropriate number of them took
place. The TDHS method provides good quality in the class of low-complexity time-domain methods.

An alternative to the TDHS method is disclosed in an article entitled "High Quality Time-Scale
Modification for Speech" by S. Roucos and A. M. Wilgus, Proceedings ICASSP 86, Tokyo, March, 1985,
pp. 493-496. This article discloses a Synchronized Overlap-Add ("SOLA") time-domain processing method
which has low complexity and which operates without regard to pitch periods in a speech signal. In
accordance with the SOLA method, an input signal is sampled and the samples are segmented at a fixed
analysis rate into frames, referred to as windows, and the windows are shifted in time to maintain a
predetermined average time-compression or expansion. The windows are then overlap-added at a dynamic
synthesis rate to provide an output. In accordance with this method, the input signal is windowed using a
fixed inter-frame shift interval and the output signal is reconstructed using dynamic inter-frame shift
intervals. The inter-frame shift interval used during reconstruction is allowed to vary so that a shift which
maximizes the cross-correlation of a current window with previous windows is used. Hence, this method
results in a region of overlap which is dynamic between windows and which requires evaluation of a
cross-correlation with a variable number of points. As a result, this method allows one to change the relative
overlap between windows which, in turn, modifies the time-scale of the input signal without significantly
affecting the periods in the signal.

The SOLA method may be understood in light of the following description, which should be read in
conjunction with FIG. 1. First, with reference to FIG. 1, there are four parameters which are used in the
SOLA method: (a) window length W is the duration of windowed segments of the input signal; this
parameter is the same for the input and output buffers and represents the smallest unit of the input signal,
for example, speech, that is manipulated by the method; (b) analysis shift S_a is the interframe interval
between successive windows along the input signal; (c) synthesis shift S_s is the interframe interval between
successive windows along the unshifted output signal; and (d) shift search interval K_max is the duration of
the interval over which a window may be shifted for purposes of aligning it with previous windows.

The SOLA method modifies the time-scale of an input signal in two steps, which are referred to as
analysis and synthesis, respectively. The analysis step comprises cutting up the input signal x[n], where n is a
sample index and x[n] is the value of the n-th sample, into possibly overlapping windows, where x_m[n] is the
n-th sample of the m-th input window. Each input window has a fixed length W and is separated by a fixed
analysis distance S_a. In accordance with the SOLA method:

(1) $$x_m[n] = \begin{cases} x[mS_a + n], & n = 0, \ldots, W - 1 \\ 0, & \text{otherwise} \end{cases}$$

The synthesis step comprises overlap-adding the windows from the analysis step every S_s samples.
Each new window is aligned with the sum of previous windows before being added to reduce discontinuities
in the resulting signal which arise from the different interframe intervals which are used during analysis and
synthesis, i.e., the windows are overlapped and recombined with the separation between them compressed
or expanded so that, on average, windows are separated by a new synthesis distance S_s. The ratio
α = S_s / S_a gives the desired compression or expansion rate, where α > 1 corresponds to expansion and
α < 1 corresponds to compression. The approximate duration of the modified signal is given by
"α × (duration of the input signal)."

The synthesis shift which is actually used for the m-th window x_m[n], i.e., x_m[n] = x[mS_a + n] for
n = 0, ..., W - 1, is adjusted by an amount k_m which is less than or equal to K_max in order to maximize a
similarity measure of data in the overlapping regions before the overlap-add step is carried out. As a result, in
accordance with the SOLA method, the output y[i], where i is a sample index and y[i] is the value of the i-th
sample, is formed recursively by:

(2) $$y[mS_s + k_m + n] \leftarrow b_m[n]\, y[mS_s + k_m + n] + (1 - b_m[n])\, x_m[n], \quad n = 0, \ldots, W^m_{ov} - 1$$

and

(3) $$y[mS_s + k_m + n] \leftarrow x_m[n], \quad n = W^m_{ov}, \ldots, W - 1$$

where W^m_ov is the number of overlap points for the m-th window and W^m_ov = k_{m-1} - k_m + W - S_s.
Further, shift k_m is selected to maximize a similarity measure, for example, the cross-correlation or average
magnitude difference, in the overlap region between the current output y and the m-th window x_m. Still
further, b_m[n] is a fading factor between 0 and 1, for example, an averaging or a linear fade, which is chosen
to minimize audible splicing artifacts.
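The following Python sketch illustrates Eqs. (1)-(3) and, in particular, how the overlap length W^m_ov changes from window to window. It is a minimal sketch under stated assumptions, not the SOLA authors' implementation: the helper names `ncc` and `sola`, the linear fade used for b_m[n], the restriction of shifts to 0 ≤ k_m ≤ K_max, and the parameter constraints checked at the top of `sola` are all assumptions made for illustration.

```python
import math


def ncc(a, b):
    """Normalized cross-correlation of two equal-length sequences."""
    num = sum(p * q for p, q in zip(a, b))
    den = math.sqrt(sum(p * p for p in a) * sum(q * q for q in b))
    return num / den if den > 0 else 0.0


def sola(x, W, Sa, Ss, Kmax):
    """Sketch of prior-art SOLA, Eqs. (1)-(3).

    Assumes 0 <= Kmax < Ss and Kmax < W - Ss, so every candidate shift leaves
    at least one overlapping sample and every window extends the output.
    Returns the output and the overlap length W^m_ov used for each window.
    """
    if not (0 <= Kmax < Ss and Kmax < W - Ss):
        raise ValueError("sketch assumes 0 <= Kmax < Ss and Kmax < W - Ss")
    y = list(x[:W])                    # window 0 is copied with k_0 = 0
    overlaps = []
    m = 1
    while m * Sa + W <= len(x):
        win = x[m * Sa : m * Sa + W]   # fixed analysis position, Eq. (1)
        best_k, best_r = 0, -math.inf
        for k in range(Kmax + 1):
            start = m * Ss + k
            ov = len(y) - start        # overlap length depends on k
            r = ncc(win[:ov], y[start:])
            if r > best_r:
                best_k, best_r = k, r
        start = m * Ss + best_k
        ov = len(y) - start            # W^m_ov = k_{m-1} - k_m + W - Ss
        overlaps.append(ov)
        for n in range(ov):            # Eq. (2): the fade length varies per window
            b = 1.0 - (n + 1) / (ov + 1)
            y[start + n] = b * y[start + n] + (1 - b) * win[n]
        y.extend(win[ov:])             # Eq. (3)
        m += 1
    return y, overlaps


x = [math.sin(0.3 * i) for i in range(1000)]
y, overlaps = sola(x, W=40, Sa=30, Ss=20, Kmax=10)
# overlaps records W^m_ov for every window; in general it is not constant
```

Because `ov` depends on each choice of k, both the correlation length and the fade length must be recomputed for every window, which is the drawback of SOLA discussed in the following paragraph.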

The SOLA method has a drawback in that the amount of overlap for the m-th window, W^m_ov, between the
output and the m-th analysis window varies with k_m, and this complicates the work required to compute the
similarity measure and to fade across the overlap region. Also, depending on the shifts k_m, more than two
windows may overlap in certain regions, and this further complicates the fading computation.

As a result, there is a need in the art for a method for modifying the time-scale of speech, music, or
other acoustic material without modifying the pitch, which is robust, and which does not require excessive
amounts of computation.

Summary of the Invention 

Embodiments of the present invention advantageously satisfy the above-identified need in the art and
provide a method for modifying the time-scale of speech, music, or other acoustic material over a wide
range of compression and expansion without modifying the pitch.

The inventive method is an improvement on the SOLA method described in the Background of the
Invention and is referred to here as a Synchronized Overlap-Add, Fixed Synthesis time-domain processing
method ("SOLAFS"). In general, the inventive method comprises superimposing partially overlapping blocks
of signal samples from an input signal in a manner which aligns similar signal blocks from different locations
in the input signal. Further, in accordance with a preferred embodiment of the present invention, if the
distance between similar blocks of the input signal to be superimposed is greater than the distance between
superimposition regions, the rate of reproduction will be increased, i.e., the time-scale will be compressed.
Correspondingly, if the distance between similar blocks of the input signal to be superimposed is less than
the distance between superimpositions, the rate of reproduction will be decreased, i.e., the time-scale will be
expanded.

In accordance with the present invention, blocks of the input signal, referred to as analysis windows, are
taken at an average rate of S_a, with each starting position allowed to vary within limits, and an output signal is
reconstructed using a fixed inter-block offset S_s, i.e., the duration of overlap with the existing signal in each
window to be added is fixed. This is done by searching for segments of the input signal near the target
starting position mS_a which are similar to the portion of the output signal that will overlap when constructing
the output signal. A similarity measure is used to evaluate such similarity and, in accordance with the
present invention, the similarity measure uses a fixed, predetermined minimum number of samples. The
fact that the region of overlap is fixed is advantageous because the number of computations required to
evaluate the similarity measure over the range of shift values is reduced relative to that required in the
prior art SOLA method. Several similarity measures are evaluated by shifting the starting point of an
analysis window over a predetermined number of samples, i.e., removing samples from the beginning of the
analysis window as new samples from the input are appended to the tail of the analysis window, thus using
the same, predetermined number of samples in each evaluation. The starting position of the analysis window
which provides the maximum similarity in the region of the analysis window which will overlap with the
region of the output signal is selected from all starting positions tested. Finally, the predetermined number
of samples in the region of overlap are combined with the predetermined number of samples from the end
of the previous portion of the output signal, and the remaining samples in the window are appended to the
combined segment of the previous portion of the output signal.

An important attribute of the SOLAFS method is that the starting position which provides the maximum
similarity over the range of possible starting positions for a given input block can often be determined
without evaluating the similarity measure for all possible starting positions. This method of determining the
"best" shift without evaluating all possible shifts is referred to as "prediction." "Prediction" occurs when the
fixed region of the output signal which is used in the similarity measure evaluation is also contained in the
range of possible starting positions for the next input block. Whenever this occurs, one can "predict" with
certainty that a shift which overlaps these identical regions will maximize the similarity measure. Although
"prediction" is not possible in all cases, for moderate changes in the time-scale or for processing in which
small inter-block intervals are used, "prediction" is possible quite often. As one can readily appreciate,
"prediction" is highly advantageous because it obviates the need to merge the overlapping regions, since
they are identical. As a result, only data points beyond the region of overlap from the new input block need
to be appended to the output to extend the signal.
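Using the relation t_m = k_{m-1} + S_s - S_a that is derived in the Detailed Description, the prediction test reduces to a bounds check. The Python sketch below uses a hypothetical helper, `predicted_shift`, which is not a name from the source; it assumes the search range is 0 ≤ k ≤ K_max and that at most two windows overlap, i.e., S_s ≥ W - S_s.

```python
def predicted_shift(k_prev, Sa, Ss, W, Kmax):
    """Return the predicted shift t_m = k_{m-1} + Ss - Sa, or None.

    Prediction is only attempted when at most two windows overlap
    (Ss >= W - Ss) and t_m falls inside the search range 0..Kmax;
    otherwise the caller must fall back to a full search.
    """
    Wov = W - Ss
    if Ss < Wov:
        return None
    t = k_prev + Ss - Sa
    return t if 0 <= t <= Kmax else None


print(predicted_shift(0, Sa=2, Ss=3, W=5, Kmax=2))   # 1 (small expansion)
print(predicted_shift(0, Sa=5, Ss=2, W=4, Kmax=2))   # None (t = -3, full search needed)
print(predicted_shift(0, Sa=20, Ss=22, W=40, Kmax=8))  # 2 (moderate expansion)
```

In a full implementation one would call such a helper before the search loop and run the exhaustive search only when it returns `None`; when it succeeds, the overlapping samples are identical, so the overlap-add can be skipped as well.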

Since the inventive method uses fixed segment lengths which are independent of local pitch, the
inventive SOLAFS method advantageously operates equally well on speech or non-speech signals. Further,
since the inventive method aligns only a fraction of an analysis window to the time-scaled signal, the
inventive SOLAFS method advantageously is more efficient than the SOLA method and provides greater
flexibility in choice of parameters. Still further, since the inventive method maintains the extent of
superimposition constant throughout each frame and fixes it over the range of reproduction rates, the
inventive SOLAFS method advantageously simplifies the computation required when compared to the
computation required to carry out the SOLA method. As a result, the inventive SOLAFS method advantageously
provides a robust time-scale modification ("TSM") signal using substantially less computation than
SOLA or TDHS, and the TSM signal is unaffected by the presence of white noise in the input signal. Further,
using a relatively small amount of trial and error, one can determine parameters for use in embodying the
inventive method so that the resultant time-scale modified speech contains few audible artifacts and
preserves speaker identity.

Brief Description of the Drawing

A complete understanding of the present invention may be gained by considering the following detailed
description in conjunction with the accompanying drawing, in which:

FIG. 1 shows, in pictorial form, the manner in which the prior art SOLA method operates to provide
time-scale compression for an input signal;

FIG. 2 shows, in pictorial form, the manner in which an embodiment of the inventive method operates to
provide time-scale compression for an input signal;

FIG. 3 shows, in pictorial form, the manner in which an embodiment of the inventive method operates to
provide time-scale expansion for an input signal;

FIG. 4 shows a detailed analysis of the manner in which an embodiment of the inventive SOLAFS
method operates;

FIGs. 5-7 show a flowchart of the inventive SOLAFS method; and

FIG. 8 shows, in pictorial form, the manner in which an embodiment of the present invention operates to
provide time-scale modification utilizing "prediction."

Detailed Description



The present invention relates to a method for time-scale modification ("TSM"), i.e., changing the rate of
reproduction, of a signal and, in particular, to a method for time-scale modification of a sampled signal by
time-domain processing of the sampled signal to provide reproduction of the signal at a wide variety of rates
without an accompanying change in pitch. An input to the inventive method is a stream of digital samples
which represent samples of a signal. There exist many apparatus which are well known to those of ordinary
skill in the art for receiving an input signal such as a voice signal and for providing digital samples thereof.
For example, it is well known to those of ordinary skill in the art that commercially available equipment
exists for receiving an input analog signal and for sampling the signal at a rate which is at least the Nyquist
rate to provide a stream of digital samples which may be converted back into an analog signal without loss of
fidelity. The inventive method accepts, as input, the stream of digital samples and produces, as output, a
stream of digital samples which are representative of a TSM signal. The TSM digital output is then
converted back into an analog signal using methods and apparatus which are well known to those of
ordinary skill in the art.

The inventive method is an improvement of the prior SOLA method discussed in the Background of the
Invention, which inventive method is referred to as the Synchronized Overlap-Add, Fixed Synthesis method
("SOLAFS"). With reference to FIGs. 1 and 2, there are four parameters which are used in the inventive
SOLAFS method: (a) window length W is the duration of windowed segments of the input signal; this
parameter is the same for input and output buffers and represents the smallest unit of the input signal, for
example, speech, that is manipulated by the method; (b) analysis shift S_a is the interframe interval between
successive search ranges for analysis windows along the input signal; (c) synthesis shift S_s is the interframe
interval between successive analysis windows along the output signal; and (d) shift search interval K_max is
the duration of the interval over which an analysis window may be shifted for purposes of aligning it with the
region of the output signal it will overlap.

In essence, the first W_ov samples in each new window in the input signal, referred to as an analysis
window, are overlap-added with the last W_ov samples in the output signal; this is referred to as overlap-adding
at a fixed synthesis rate. In accordance with the inventive method, the starting point of each analysis
window is varied by: (a) evaluating a similarity measure such as, for example, the cross-correlation, of the
first W_ov points in the analysis window with the last W_ov points in the output signal, where W_ov is a
predetermined, fixed number; (b) then shifting the starting point of the analysis window by a fixed amount
and evaluating a new cross-correlation of the first W_ov points in the new analysis window with the same last
W_ov points in the output signal; and (c) performing step (b) a predetermined number of times, K_max, and
choosing as the new analysis window the one for which the cross-correlation is maximized.
Finally, the first W_ov samples in the new analysis window are overlap-added with the last W_ov samples in
the output signal, and S_s additional points from the analysis window are appended to the output signal. The
term overlap-added refers to a method of combination such as averaging points or performing a weighted
average in accordance with a predetermined weighting function.
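Steps (a) through (c) and the final overlap-add can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than a reference implementation: the function names are hypothetical, the similarity measure is the normalized cross-correlation of Eqs. (7)-(10), the fade b[n] is an assumed linear ramp, the shift search covers k = 0, ..., K_max, and processing simply stops when a full candidate window no longer fits in the input.

```python
import math


def ncc(a, b):
    """Normalized cross-correlation of two equal-length sequences."""
    num = sum(p * q for p, q in zip(a, b))
    den = math.sqrt(sum(p * p for p in a) * sum(q * q for q in b))
    return num / den if den > 0 else 0.0


def solafs(x, W, Sa, Ss, Kmax):
    """Sketch of SOLAFS: variable analysis start, fixed synthesis shift Ss.

    The overlap length Wov = W - Ss is the same for every window, so the
    fade table b is computed once.
    """
    if not (0 < Ss <= W and Sa > 0 and Kmax >= 0):
        raise ValueError("need 0 < Ss <= W, Sa > 0, Kmax >= 0")
    Wov = W - Ss
    b = [1.0 - (n + 1) / (Wov + 1) for n in range(Wov)]  # assumed linear fade
    y = list(x[:W])                       # initialization: copy the first W samples
    m = 1
    while m * Sa + Kmax + W <= len(x):    # every candidate window fits in x
        tail = y[m * Ss : m * Ss + Wov]   # the last Wov output samples
        best_k, best_r = 0, -math.inf
        for k in range(Kmax + 1):         # steps (a)-(c): slide the start point
            s = m * Sa + k
            r = ncc(x[s : s + Wov], tail)
            if r > best_r:
                best_k, best_r = k, r
        win = x[m * Sa + best_k : m * Sa + best_k + W]
        for n in range(Wov):              # overlap-add the fixed-length region
            y[m * Ss + n] = b[n] * y[m * Ss + n] + (1 - b[n]) * win[n]
        y.extend(win[Wov:])               # append Ss further samples
        m += 1
    return y


x = [math.sin(0.3 * i) for i in range(1000)]
y = solafs(x, W=40, Sa=30, Ss=20, Kmax=10)  # compression, alpha = 2/3
print(len(y))  # 660
```

Note that the correlation length and the fade length never change, which is exactly the simplification over SOLA that the text describes.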

In the following, x[i] represents the i-th sample in the input digital stream representative of an input signal.
In accordance with the inventive method, analysis windows are chosen as follows:

(4) $$x_m[n] = \begin{cases} x[mS_a + k_m + n], & n = 0, \ldots, W - 1 \\ 0, & \text{otherwise} \end{cases}$$

where: m is a window index, i.e., it refers to the m-th window; n is a sample index in an input buffer for the
input signal, which buffer is W samples long; k_m is the number of samples of shift for the m-th window; and
x_m[n] represents the n-th sample in the m-th analysis window.

The analysis windows are then used to form the output signal y[i] recursively in accordance with the
following:

(5) $$y[mS_s + n] \leftarrow b[n]\, y[mS_s + n] + (1 - b[n])\, x_m[n], \quad n = 0, \ldots, W_{ov} - 1$$

and

(6) $$y[mS_s + n] \leftarrow x_m[n], \quad n = W_{ov}, \ldots, W - 1$$

where W_ov = W - S_s is the number of points in the overlap region and b[n] is an overlap-add weighting
function, referred to as a fading factor, such as an averaging function, a linear fade function, and so forth.

Note that, in accordance with the present invention, shift k_m affects the starting position of an analysis
window in the input digital stream. For a particular window, an optimal shift is determined by maximizing a
similarity measure between the overlapping samples in x_m and y. A similarity measure which works well in
practice is the normalized cross-correlation between x and y in the overlap region:



(6) $$k_m \leftarrow \arg\max_{0 \le k \le K_{max}} R^m_{xy}[k]$$

where K_max is the maximum allowable shift from the initial starting position of the analysis window, and

(7) $$R^m_{xy}[k] = \frac{r^m_{xy}[k]}{\left(r^m_{xx}[k]\, r^m_{yy}\right)^{1/2}}$$

where:

(8) $$r^m_{xy}[k] = \sum_{n=0}^{W_{ov}-1} x[mS_a + k + n]\, y[mS_s + n]$$

(9) $$r^m_{xx}[k] = \sum_{n=0}^{W_{ov}-1} x^2[mS_a + k + n]$$

(10) $$r^m_{yy} = \sum_{n=0}^{W_{ov}-1} y^2[mS_s + n]$$



Other similarity measures, such as the average magnitude difference, could also be utilized:

(11) $$R^m_{avmag}[k] = \sum_{n=0}^{W_{ov}-1} \left| y[mS_s + n] - x[mS_a + k + n] \right|$$

in which case the shift that minimizes R^m_avmag[k] is selected. However, this particular measure is not
optimal since it is sensitive to signal amplitude.
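For concreteness, both measures can be written directly from Eqs. (7)-(11). In this Python sketch the helper names `ncc_at`, `amdf_at` and `best_shift` are hypothetical; note that the normalized cross-correlation is maximized, as in Eq. (6), while the average magnitude difference is minimized. The toy input is a period-4 sequence chosen so that the correct shift is easy to verify by hand (W = 4, S_a = 5, S_s = 2, W_ov = 2, K_max = 3 are illustrative values).

```python
import math


def ncc_at(x, y, m, k, Sa, Ss, Wov):
    """Normalized cross-correlation R^m_xy[k], Eqs. (7)-(10)."""
    xs = x[m * Sa + k : m * Sa + k + Wov]
    ys = y[m * Ss : m * Ss + Wov]
    rxy = sum(a * b for a, b in zip(xs, ys))  # Eq. (8)
    rxx = sum(a * a for a in xs)              # Eq. (9)
    ryy = sum(b * b for b in ys)              # Eq. (10)
    return rxy / math.sqrt(rxx * ryy) if rxx * ryy > 0 else 0.0


def amdf_at(x, y, m, k, Sa, Ss, Wov):
    """Average magnitude difference of Eq. (11); smaller means more similar."""
    return sum(abs(y[m * Ss + n] - x[m * Sa + k + n]) for n in range(Wov))


def best_shift(x, y, m, Sa, Ss, Wov, Kmax, measure="ncc"):
    """Search k = 0..Kmax; ties go to the smallest shift."""
    shifts = range(Kmax + 1)
    if measure == "ncc":
        return max(shifts, key=lambda k: ncc_at(x, y, m, k, Sa, Ss, Wov))
    return min(shifts, key=lambda k: amdf_at(x, y, m, k, Sa, Ss, Wov))


x = [0, 1, 0, -1] * 10   # toy periodic input
y = x[:4]                # output after initialization with W = 4
print(best_shift(x, y, m=1, Sa=5, Ss=2, Wov=2, Kmax=3))                  # 1
print(best_shift(x, y, m=1, Sa=5, Ss=2, Wov=2, Kmax=3, measure="amdf"))  # 1
print(ncc_at(x, y, 1, 1, 5, 2, 2))                                       # 1.0
```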

Finally, note that overlap regions occur in the output at a predictable rate, S_s, and have a fixed length,
W_ov. This can be seen in FIG. 2, which shows a TSM compressed signal, and FIG. 3, which shows a TSM
expanded signal. Therefore, a fixed-length fading function b[n] can be used, and its values can be
precomputed and stored in a lookup table.
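Because W_ov is fixed, b[n] can be tabulated once. The Python sketch below shows two hypothetical fading tables, an averaging table and a linear fade, and applies Eq. (5) with a precomputed table; the particular linear ramp (values strictly between 0 and 1) is one possible choice, not one specified in the text.

```python
def linear_fade(Wov):
    """b[n] for Eq. (5): weight on the existing output, falling linearly toward 0."""
    return [1.0 - (n + 1) / (Wov + 1) for n in range(Wov)]


def averaging_fade(Wov):
    """b[n] = 1/2 everywhere: plain averaging of the two overlapping segments."""
    return [0.5] * Wov


def overlap_add(y_tail, x_head, b):
    """Apply Eq. (5) to one overlap region using a precomputed table b."""
    return [bn * yv + (1.0 - bn) * xv for bn, yv, xv in zip(b, y_tail, x_head)]


FADE = linear_fade(3)   # computed once and reused for every window
print(FADE)             # [0.75, 0.5, 0.25]
print(overlap_add([4.0, 4.0, 4.0], [0.0, 0.0, 0.0], FADE))  # [3.0, 2.0, 1.0]
```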

The following provides a detailed explanation of how the inventive SOLAFS method operates, in
conjunction with FIG. 4. Referring to FIG. 4, the samples in the digital input stream 100 are labeled 1, 2, 3,
and so forth. Although the relative heights of the arrows could be used to indicate the amplitude of a sample
at a particular point in time, for purposes of the following description, the heights of the arrows have no
particular significance.

First, we will consider a TSM compressed signal. In such a case S_s < W < S_a. For purposes of
understanding the manner in which the inventive method operates, let S_a = 5, W = 4, S_s = 2, and
W_ov = W - S_s = 2. As an initialization step, take W samples from the input signal, which samples are stored
in an input signal buffer, and place them in an output sample buffer for the output signal. This is shown as line
101 in FIG. 4. Next, find the start of the first analysis window. The first analysis window starts at sample 5,
i.e., mS_a = 5 where m = 1. Note that, in accordance with the inventive method, we are skipping over sample 4
at the end of the previous analysis window. Next, we will find the maximum similarity between the first W_ov
samples, i.e., 2 samples in this case, at the start of the analysis window and the end of the output signal.
Referring to line 102 of FIG. 4, we compute the cross-correlation between samples 5 and 6 from the start of
the analysis window and samples 2 and 3 from the end of the output window. Next, we shift the start of the
analysis window by one and repeat the process. This is indicated as line 103 in FIG. 4, where we compute
the cross-correlation between samples 6 and 7 from the new start of the analysis window and samples 2
and 3 from the end of the output window. This process is continued until we have shifted the analysis
window by the maximum amount K_max which is allowed. Then, we determine which shift corresponds to the
maximum cross-correlation. Assume that the maximum cross-correlation occurs when we shift by one
sample. In that case, we shift the starting position of the analysis window by one sample from the start of
the search range in the input buffer, i.e., to sample 6 rather than sample 5, overlap-add the last W_ov samples
of the output signal and the first W_ov samples (6 and 7) from the start of the analysis window, and transfer
W - W_ov = 2 further samples into the output buffer. This is shown in line 104. Now, this process is
repeated by choosing the next analysis window. The next analysis window starts at sample 10, i.e., mS_a = 10
when m = 2.
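The index bookkeeping in this compression example can be reproduced with a short Python sketch. It assumes, as the example's numbers imply, that sample labels coincide with 0-based buffer positions (the last two of the first W = 4 output samples are samples 2 and 3). The text does not fix K_max, so K_max = 2 is a hypothetical value, and the helper names are illustrative.

```python
def search_regions(m, Sa, Ss, W, Kmax):
    """Index sets compared while searching for the m-th analysis window."""
    Wov = W - Ss
    out_region = list(range(m * Ss, m * Ss + Wov))  # last Wov output samples
    candidates = {k: list(range(m * Sa + k, m * Sa + k + Wov))
                  for k in range(Kmax + 1)}
    return out_region, candidates


def placed_window(m, Sa, Ss, W, k):
    """Input samples that are overlap-added, and those appended, for shift k."""
    Wov = W - Ss
    start = m * Sa + k
    return list(range(start, start + Wov)), list(range(start + Wov, start + W))


# Compression example from the text: Sa = 5, W = 4, Ss = 2 (so Wov = 2).
out_region, cands = search_regions(1, Sa=5, Ss=2, W=4, Kmax=2)
print(out_region)                              # [2, 3]
print(cands[0], cands[1])                      # [5, 6] [6, 7]
print(placed_window(1, Sa=5, Ss=2, W=4, k=1))  # ([6, 7], [8, 9])
# The search for window m = 2 then begins at input sample 2 * Sa = 10.
```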

Second, we will consider a TSM expanded signal. In such a case W > S_s > S_a. For purposes of understanding the manner in which the inventive method operates, let S_a = 2, W = 5, S_s = 3, and W_ov = W - S_s = 2. As an initialization step, take W samples from the input signal and place them in the output buffer. This is shown as line 201 in FIG. 4. Next, find the start of the first analysis window. The first analysis window starts at sample 2, i.e., mS_a = 2 when m = 1. Next, we find the maximum similarity between the first W_ov samples, i.e., 2 samples in this case, at the start of the analysis window and the end of the output signal. Referring to line 202 of FIG. 4, we compute the cross-correlation between samples 2 and 3 from the start of the analysis window and samples 3 and 4 from the end of the output window. Next, we shift the start of the analysis window by one and repeat the process. This is indicated as line 203 in FIG. 4, where we compute the cross-correlation between samples 3 and 4 from the new start of the analysis window and samples 3 and 4 from the end of the output window. This process is continued until we have shifted the signal by the maximum allowed amount, K_max. Then, we determine which shift corresponds to the maximum cross-correlation. Assume that the maximum cross-correlation occurs when we shift by one sample. In that case, we shift the starting point of the analysis window one sample from the start of the search range in the input buffer, i.e., start at sample 3 rather than sample 2, overlap-add the last W_ov samples of the output signal and the first W_ov samples from the start of the analysis window, and transfer W - W_ov = 3 further samples into the output buffer. This is shown in line 204. Now, this process is repeated by choosing the next analysis window. The next analysis window starts at sample 4, i.e., mS_a = 4 when m = 2.
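The first search step of this expansion example can be sketched in Python. This is a minimal illustration under stated assumptions, not the patented implementation: the input signal, the value K_max = 3 (the example does not fix K_max), and the helper names `normalized_xcorr` and `best_shift` are hypothetical; the similarity is assumed to be the normalized cross-correlation; and the overlap is merged by a simple average.

```python
import math


def normalized_xcorr(a, b):
    """Normalized cross-correlation of two equal-length sequences."""
    num = sum(p * q for p, q in zip(a, b))
    den = math.sqrt(sum(p * p for p in a) * sum(q * q for q in b))
    return num / den if den > 0 else 0.0


def best_shift(x, y_tail, start, k_max):
    """Shift k in [0, k_max] whose first len(y_tail) samples of x[start + k:]
    are most similar to y_tail (the last W_ov output samples)."""
    w_ov = len(y_tail)
    scores = [normalized_xcorr(x[start + k:start + k + w_ov], y_tail)
              for k in range(k_max + 1)]
    return max(range(k_max + 1), key=lambda k: scores[k])


# Expansion example: S_a = 2, W = 5, S_s = 3, W_ov = 2.
S_a, W, S_s = 2, 5, 3
W_ov = W - S_s
K_MAX = 3                                # hypothetical; not given in the example
x = [0, 1, 0, -1, 0, 1, 0, -1, 0, 1]    # hypothetical periodic input

y = x[:W]                                # initialization: samples 0..4 (line 201)
k1 = best_shift(x, y[-W_ov:], start=1 * S_a, k_max=K_MAX)  # lines 202-203

start = S_a + k1                         # analysis window now starts at sample 3
window = x[start:start + W]
# Overlap-add (here a plain average) over the W_ov overlapping samples ...
y[-W_ov:] = [(a + b) / 2 for a, b in zip(y[-W_ov:], window[:W_ov])]
# ... then transfer the remaining W - W_ov = 3 samples (line 204).
y += window[W_ov:]
print(k1, y)
```

With this input, the last two output samples (-1, 0) reappear at input samples 3 and 4, so the search selects a shift of one, matching the assumption made in the example.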

It is interesting to note that, despite a superficial similarity, SOLA and SOLAFS function quite differently.

For example, the prior-art SOLA method achieves compression by a factor of two by averaging two pitch periods into one. In the same situation, the inventive SOLAFS method splices out every other pitch period and uses short transition regions to smooth over the gap. More generally, if the distance S_a is greater than the distance S_s, then, on average, (S_a - S_s) samples are deleted between segments. Conversely, if S_a is less than S_s, then, on average, (S_s - S_a) samples are replicated in adjacent segments. The actual shift used between windows is given by (S_a + k_m), so that the duration of the deleted or repeated segment is (S_a + k_m - S_s) or (S_s - S_a - k_m), respectively, and varies to provide smooth splices.

An advantage of the present invention results from the fact that the shift k_m which maximizes the similarity in the overlap region can often be predicted without computing the similarity. This can be understood as follows. Assume that no more than two windows overlap at any point in the output, and consider the state of the system just before the m-th window is added.

Eqns. (5) and (6) indicate that the last W_ov samples of the output y will be equal to samples in the input stream, for 0 ≤ n < W_ov:

\[
\begin{aligned}
y[mS_s + n] &= y[(m-1)S_s + (S_s + n)] \\
&= x[(m-1)S_a + k_{m-1} + (S_s + n)] \\
&= x[mS_a + t_m + n]
\end{aligned}
\tag{12}
\]

where t_m = k_{m-1} + S_s - S_a.

Also assume that 0 ≤ t_m ≤ K_max. Then, when the last W_ov samples of the output, y[mS_s + n], are cross-correlated with the first W_ov samples of the possible analysis windows, x[mS_a + k + n], the maximum must be at k_m = t_m. With this offset, the output and input samples in the overlap region are identical, and the normalized cross-correlation is 1. Thus, the m-th shift, k_m, should be determined by:

\[
k_m \leftarrow
\begin{cases}
t_m = k_{m-1} + (S_s - S_a) & \text{if } 0 \le t_m \le K_{\max} \\[4pt]
\operatorname*{arg\,max}_{0 \le k \le K_{\max}} R^{m}_{xy}[k] & \text{otherwise}
\end{cases}
\tag{13}
\]

Furthermore, if the m-th shift is predictable, then the averaging in eqn. (5) is unnecessary, since the points overlap-added together are identical; the input can simply be copied into the output stream. In effect, shift prediction behaves like a modify-on-demand system, since splicing and overlap-adding are only necessary if the predicted shift t_m falls outside the allowable range [0, K_max]. For mild compression or expansion, with S_s ≈ S_a, most of the shifts will be predictable and only occasional splicing will be necessary to modify the time-scale.
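The shift prediction of eqn. (13) reduces to a few lines. The sketch below assumes integer S_a and S_s; the function name `predict_shift` is hypothetical, and returning `None` stands in for falling back to the full similarity search. The parameter values are those of the FIG. 8 example that follows.

```python
def predict_shift(k_prev, S_s, S_a, K_max):
    """Eqn. (13): predicted shift t_m = k_{m-1} + (S_s - S_a).

    Returns t_m when 0 <= t_m <= K_max (no similarity search and no
    overlap-add needed); otherwise returns None, meaning the shift must
    be found by maximizing R_xy[k] over 0 <= k <= K_max."""
    t = k_prev + (S_s - S_a)
    return t if 0 <= t <= K_max else None


# FIG. 8 parameters: W = 9, S_s = 6, S_a = 4, K_max = 5, starting from k_0 = 0.
k = 0
trace = []
for m in (1, 2, 3):
    t = predict_shift(k, S_s=6, S_a=4, K_max=5)
    trace.append(t)
    if t is None:
        break          # window 3: a full search is required
    k = t
print(trace)           # [2, 4, None]
```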

FIG. 8 shows, in pictorial form, the operation of an embodiment of the inventive SOLAFS method for a case of moderate time-scale expansion, i.e., W = 9, S_s = 6, S_a = 4, K_max = 5, where prediction may be used. As shown in FIG. 8, line 800 displays signal representations for a periodic input signal. Line 801 displays an output signal after the initialization step of the SOLAFS method. As shown in line 801, the last W_ov = 3 signal representations of the output signal, labelled as points 6, 7, and 8, are used to obtain a similarity measure for determining the starting position of the first window. Note that the axes for lines 800-804 have been aligned in FIG. 8 in order to better illustrate the relationships among key regions of the input and output signals during processing. Line 800 also displays the region of possible starting locations for the start of each window to be added to the output signal.

As is evident from lines 800 and 801 in FIG. 8, the search interval for the start of window 1 on line 800 contains the same signal representations that are used in the output signal to evaluate the similarity measure, i.e., the signal representations in the overlap region marked for window 1 on line 801. As a result, the shift which aligns these signal representations in the overlap region of window 1 with the end of the output signal of line 801 will be selected as the shift which maximizes the similarity measure over the range of possible starting positions. The shift which accomplishes this result can be calculated using eqn. (13). In this case, t_1 = k_0 + (S_s - S_a) = 0 + 2 = 2, and k_1 = 2. Such a shift can be determined without evaluating the similarity measure as long as the starting point of the W_ov region of the output signal is present in the range of possible starting positions for the next window.

Line 802 in FIG. 8 shows the output signal after the addition of window 1 from the input signal. From the numbers shown above the signal representations in FIG. 8, one can see that no arithmetic merging was required in the overlap region, since the points were identical, and subsequent data points were merely appended to the output signal. Similarly, in line 803, the start of window 2 is selected so as to align the regions of overlap, and the shift which accomplishes this result can be calculated using eqn. (13): t_2 = k_1 + (S_s - S_a) = 2 + 2 = 4, and k_2 = 4.

For window 3, however, the region of output used in the similarity evaluation, marked on line 803, is not present in the search range of possible starting positions. In this case, the shift needed to align the regions, given by eqn. (13) as t_3 = k_2 + (S_s - S_a) = 4 + 2 = 6, is greater than K_max = 5 and is therefore not possible. Thus, the similarity measure must be evaluated for all possible shifts to determine the best shift.

On line 804, a shift of 0 is selected as the best shift. The signal representations from window 3 in the region of overlap marked on line 803 are no longer identical to the last W_ov signal representations of the output signal on line 803, and must be arithmetically merged to extend the output signal as shown on line 804. At this point, predicting the best shift becomes possible again, since the points in the overlap region marked on line 804 appear in the search range for the start of window 4 in line 800.

The bulk of the computation in the inventive SOLAFS method revolves around computing the normalized cross-correlation R^m_xy[k] and choosing the maximum. This can be simplified in several ways. For example, one can avoid the square root in choosing k_m by using the following:

\[
k_m \leftarrow \operatorname*{arg\,max}_{0 \le k \le K_{\max}} \frac{r^{m}_{xy}[k]\,\lvert r^{m}_{xy}[k]\rvert}{r^{m}_{xx}[k]\; r^{m}_{yy}}
\tag{14}
\]

or, even more simply:

\[
k_m \leftarrow \operatorname*{arg\,max}_{0 \le k \le K_{\max}} \frac{r^{m}_{xy}[k]\,\lvert r^{m}_{xy}[k]\rvert}{r^{m}_{xx}[k]}
\tag{15}
\]

since the value of r^m_yy is constant over all values of k in the comparisons. Further simplifications result from computing r^m_xx[k] recursively:

\[
r^{m}_{xx}[k+1] = r^{m}_{xx}[k] + x^2[mS_a + k + W_{ov}] - x^2[mS_a + k]
\tag{16}
\]

Both eqns. (14) and (15) give precisely the same answer as eqn. (6); however, eqn. (15) requires the least computation, since the constant r^m_yy is not used and, thus, need not be computed.

On the other hand, eqn. (14) is always scaled so that its magnitude is less than or equal to 1. This may be convenient in a fixed-point implementation. Care must be taken with fixed-point arithmetic in all three approaches to avoid overflow when computing the cross-correlations r_xy, r_xx, and r_yy.
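The equivalence of the square-root-free criterion and the normalized cross-correlation, together with the recursive energy update of eqn. (16), can be checked numerically. This is a sketch under stated assumptions: the normalized cross-correlation of eqn. (6) is taken to be r_xy[k]/sqrt(r_xx[k] r_yy) with all sums over the W_ov overlap samples, and the function names are hypothetical. Because z|z| is monotonically increasing and r_yy > 0 does not depend on k, both criteria select the same shift.

```python
import math
import random


def corr_terms(x, y_tail, base, K_max):
    """Return r_xy[k] and r_xx[k] for 0 <= k <= K_max over W_ov = len(y_tail) samples."""
    W_ov = len(y_tail)
    r_xy = [sum(y_tail[n] * x[base + k + n] for n in range(W_ov))
            for k in range(K_max + 1)]
    r_xx = [sum(v * v for v in x[base:base + W_ov])]
    for k in range(K_max):
        # Eqn. (16): add the energy of the sample entering the overlap
        # region and subtract the energy of the sample leaving it.
        r_xx.append(r_xx[k] + x[base + k + W_ov] ** 2 - x[base + k] ** 2)
    return r_xy, r_xx


def shift_normalized(x, y_tail, base, K_max):
    """Maximize r_xy / sqrt(r_xx * r_yy), which needs a square root per shift."""
    r_xy, r_xx = corr_terms(x, y_tail, base, K_max)
    r_yy = sum(v * v for v in y_tail)
    return max(range(K_max + 1),
               key=lambda k: r_xy[k] / math.sqrt(r_xx[k] * r_yy))


def shift_eqn15(x, y_tail, base, K_max):
    """Eqn. (15): maximize r_xy * |r_xy| / r_xx; r_yy is never computed."""
    r_xy, r_xx = corr_terms(x, y_tail, base, K_max)
    return max(range(K_max + 1), key=lambda k: r_xy[k] * abs(r_xy[k]) / r_xx[k])


random.seed(0)
x = [random.gauss(0.0, 1.0) for _ in range(200)]
y_tail = x[57:87]   # hypothetical output tail: 30 samples copied from the input
print(shift_normalized(x, y_tail, 50, 20), shift_eqn15(x, y_tail, 50, 20))  # 7 7
```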

The inventive SOLAFS method requires a W_ov-length output buffer to hold the last samples of the output, i.e., y[mS_s], ..., y[mS_s + W_ov - 1], and a (W + K_max)-length input buffer to hold the input samples that might be used in the next analysis window, x[mS_a], ..., x[mS_a + W + K_max - 1]. Note that in a real-time application, time-scale compression will require reading in input data at a much faster rate than usual. This may cause difficulties if the data is stored in compressed form and must be decoded, or if the storage unit is slow.
FIGs. 5-7 show a flowchart of one embodiment of the inventive SOLAFS method. The following nomenclature is used in the flowchart:
(a) W is the window length and represents the smallest block or unit of a signal that is manipulated by the inventive method;
(b) S_a is the analysis shift and represents the interframe interval between successive search intervals along the input signal;
(c) S_s is the synthesis shift and represents the interframe interval between successive windows in the output signal;
(d) k_m is the window shift and represents the number of data samples by which the m-th analysis window is shifted from its target position, mS_a, to provide alignment with previous windows;
(e) K_max is the maximum window shift, i.e., 0 ≤ k_m ≤ K_max for all m;
(f) W_ov = W - S_s is the fixed number of overlapping points between windows;
(g) head_buf is a storage buffer of length K_max + W for samples from the input signal buffer; and
(h) tail_buf is a storage buffer of length W_ov.

As shown at box 500 of FIG. 5, the program performs an initialization step and sets k_0 = 0 and m = 0. Then, control is transferred to box 510. In the initialization step, the program processes the first W samples of the input signal by copying S_s samples, i.e., samples 0 to S_s - 1, from the input signal buffer to an output signal buffer, and by copying W_ov samples, i.e., samples S_s to W - 1, from the input buffer to tail_buf.

At box 510 of FIG. 5, the program increments m by 1. Then, control is transferred to box 520. 
At box 520 of FIG. 5, the program sets the variable pred equal to k_{m-1} + S_s - S_a. Then, control is transferred to decision box 530.

At decision box 530 of FIG. 5, the program determines whether 0 ≤ pred ≤ K_max. If so, control is transferred to box 550; otherwise, control is transferred to box 540.

At box 540 of FIG. 5, the program computes k_m in accordance with the flowchart shown in FIG. 6, which is described in detail below. Then, control is transferred to box 560.

At box 550 of FIG. 5, the program sets k_m = pred. Then, control is transferred to box 570.

At box 560 of FIG. 5, the program updates the first W_ov samples of head_buf, starting at offset k_m, by performing an overlap-add using a weighting function in accordance with the flowchart shown in FIG. 7. Then, control is transferred to box 570.

At box 570 of FIG. 5, the program copies S_s samples, starting at offset k_m, from head_buf to the output buffer. Then, control is transferred to box 580.

At box 580 of FIG. 5, the program copies W_ov samples from head_buf to tail_buf, starting at offset k_m + S_s in head_buf. Then, control is transferred to decision box 590.

At decision box 590 of FIG. 5, the program determines whether the end of the signal has been reached. If so, control is transferred to box 595, where the signal is output for conversion to analog form or for further processing; otherwise, control is transferred to box 597.

At box 597 of FIG. 5, the program copies K_max + W samples from the input buffer, starting at sample mS_a, to head_buf. Then, control is transferred to box 510.
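The FIG. 5 loop, with the FIG. 6 search and FIG. 7 overlap-add folded in, can be sketched in plain Python as follows. This is an illustrative sketch, not the reference implementation. It assumes integer S_a, refills head_buf at the top of each iteration from sample mS_a (after m has been incremented), uses the eqn. (15) criterion for the search, and uses a linear cross-fade f(i) = (i + 1)/(W_ov + 1) as the weighting function. The function names `search_shift` and `solafs` are hypothetical.

```python
import math


def search_shift(head, tail, K_max):
    """FIG. 6 using eqn. (15): k in [0, K_max] maximizing r_xy*|r_xy| / r_xx."""
    W_ov = len(tail)
    best_shift, r_max = 0, 0.0                 # box 600: Rxxmax = 0, best_shift = 0
    for shift in range(K_max + 1):
        seg = head[shift:shift + W_ov]
        numer = sum(t * h for t, h in zip(tail, seg))
        denom = sum(h * h for h in seg)
        r = numer * abs(numer) / denom if denom > 0 else 0.0
        if r > r_max:
            best_shift, r_max = shift, r
    return best_shift


def solafs(x, W, S_s, S_a, K_max, predict=True):
    """Time-scale modify the sample list x; output length is about len(x) * S_s / S_a."""
    W_ov = W - S_s
    out = list(x[:S_s])                        # box 500: samples 0..S_s-1 to output
    tail = list(x[S_s:W])                      # ... and samples S_s..W-1 to tail_buf
    k_prev, m = 0, 0
    while True:
        m += 1                                 # box 510
        head = list(x[m * S_a:m * S_a + W + K_max])   # box 597 (refilled here)
        if len(head) < W + K_max:              # box 590: end of signal reached
            break
        pred = k_prev + S_s - S_a              # box 520
        if predict and 0 <= pred <= K_max:     # box 530
            k = pred                           # box 550: samples already identical
        else:
            k = search_shift(head, tail, K_max)        # box 540
            for i in range(W_ov):              # box 560 / FIG. 7: linear cross-fade
                f = (i + 1) / (W_ov + 1)
                head[k + i] = f * head[k + i] + (1 - f) * tail[i]
        out.extend(head[k:k + S_s])            # box 570
        tail = head[k + S_s:k + S_s + W_ov]    # box 580
        k_prev = k
    out.extend(tail)                           # flush the pending overlap samples
    return out


fs = 8000
x = [math.sin(2 * math.pi * 200 * n / fs) for n in range(fs)]
y_fast = solafs(x, W=120, S_s=80, S_a=160, K_max=100)   # compression by 2
y_slow = solafs(x, W=120, S_s=80, S_a=40, K_max=100)    # expansion by 2
print(len(x), len(y_fast), len(y_slow))                 # 8000 3960 15640
```

When S_a equals S_s, every shift is predicted as zero and the sketch simply copies the input through, which is a convenient sanity check of the prediction path.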
FIG. 6 shows a flowchart of a procedure for computing k_m. At box 600 of FIG. 6, the program initializes variables by setting shift = 0, Rxxmax = 0, and best_shift = 0. Then, control is transferred to box 610.

At box 610 of FIG. 6, the program initializes the loop variables Rxx, i, numer, and denom by setting Rxx = 0, i = 0, numer = 0, and denom = 0. Then, control is transferred to box 620.

At box 620 of FIG. 6, the program adds tail_buf[i]*head_buf[i + shift] to numer and adds head_buf[i + shift]*head_buf[i + shift] to denom. Then, control is transferred to decision box 630.

At decision box 630 of FIG. 6, the program determines whether i < W_ov - 1. If so, control is transferred to box 635; otherwise, control is transferred to box 640.

At box 635 of FIG. 6, the program increments i by 1. Then, control is transferred to box 620.

At box 640, the program sets Rxx = numer*|numer|/denom. Then, control is transferred to decision box 645.

At decision box 645, the program determines whether Rxx is greater than Rxxmax. If so, control is transferred to box 650; otherwise, control is transferred to decision box 660.

At box 650 of FIG. 6, the program replaces the old value of Rxxmax with the value of Rxx and replaces the old value of best_shift with shift. Then, control is transferred to decision box 660.

At decision box 660 of FIG. 6, the program determines whether shift is less than K_max. If so, control is transferred to box 665; otherwise, control is transferred to box 670.

At box 665 of FIG. 6, the program increments shift by 1. Then, control is transferred to box 610. 

At box 670 of FIG. 6, k_m is set equal to best_shift. Then, control is transferred to box 680 to return.

FIG. 7 shows a flowchart of a procedure for updating the first W_ov points of head_buf using a weighting function to perform overlap-adding. At box 700 of FIG. 7, the program initializes the loop variable i by setting i = 0. Then, control is transferred to box 710.

At box 710 of FIG. 7, the program performs an overlap-add by computing head_buf[k_m + i] = f(i)*head_buf[k_m + i] + (1 - f(i))*tail_buf[i], where f(i) is a weighting function and 0 ≤ f(i) ≤ 1 for all i. Then, control is transferred to decision box 720.

At decision box 720 of FIG. 7, the program determines whether i < W_ov - 1. If so, control is transferred to box 730; otherwise, control is transferred to box 740 to return.

At box 730 of FIG. 7, the program increments i by 1. Then, control is transferred to box 710. 
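The FIG. 7 update can be sketched with the weighting function passed in as a parameter, which makes the choice between a simple average (claims 6 and 15) and a linear fade (claims 9 and 18) explicit. The particular ramp f(i) = (i + 1)/(W_ov + 1) is an assumption; the text only requires 0 ≤ f(i) ≤ 1. The function names are hypothetical.

```python
def overlap_add(head_buf, tail_buf, k_m, f):
    """FIG. 7: head_buf[k_m + i] = f(i)*head_buf[k_m + i] + (1 - f(i))*tail_buf[i]
    for 0 <= i < W_ov, where W_ov = len(tail_buf). Updates head_buf in place."""
    for i in range(len(tail_buf)):
        w = f(i)
        if not 0.0 <= w <= 1.0:
            raise ValueError("weighting function must lie in [0, 1]")
        head_buf[k_m + i] = w * head_buf[k_m + i] + (1.0 - w) * tail_buf[i]
    return head_buf


W_ov = 4


def average(i):
    """Equal weighting of the old output and the new window."""
    return 0.5


def linear_fade(i):
    """Hypothetical ramp: the new window fades in as i increases."""
    return (i + 1) / (W_ov + 1)


print(overlap_add([0.0] * 6, [1.0] * W_ov, 1, average))  # [0.0, 0.5, 0.5, 0.5, 0.5, 0.0]
print(overlap_add([0.0] * 6, [1.0] * W_ov, 1, linear_fade))
```

With the linear fade, the merged samples move smoothly from the old output (weight close to 1 at i = 0) toward the new window, which is the smoother transition the text attributes to fading.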
Large shifts S_s and S_a and large windows W cause problems in time-scale modification because the signal data may change character radically between windows. Note that |S_s - S_a| determines the minimum number of samples inserted or deleted when the predicted shift lies outside the range [0, K_max]. This is why small analysis shifts are beneficial in SOLAFS. In SOLAFS, although the number of windows increases with decreasing analysis shift S_a, the number of predictable shifts also increases, since, for a given ratio S_s/S_a, the magnitude of (S_s - S_a) in eqn. (13) decreases. Thus, the benefits of using small analysis shifts can be obtained without large increases in computation.

The window size, synthesis shift, and length of the overlap region are all interrelated. The amount of computation required to determine unpredictable shift values is on the order of K_max·W_ov multiply/adds, and thus efficient parameter combinations will use as small a value of W_ov as possible. The number of overlap points W_ov must not be too small, however, or else the variance of the similarity computation will be too large and transitions between segments will be audible. For voicemail applications with 8 kHz sampling, W_ov = 30 samples appears to be sufficient and results in smooth transitions.

To determine an appropriate window size, note that W = S_s + W_ov. If one wishes to have at most two windows overlap at any point in the output, one requires that S_s ≥ W_ov. In this case, the smallest useful synthesis shift is S_s = W_ov, and the smallest useful window length is W = 2W_ov. It is also possible to choose the synthesis shift to be less than the overlap region, S_s < W_ov, in which case more than two windows will overlap in certain regions. This allows a somewhat smoother transition between windows, but it increases the computation, and the shifts predicted by eqn. (13) are no longer guaranteed to maximize the similarity in the overlap region. With S_s fixed, the analysis shift S_a is chosen to achieve the desired compression or expansion rate. Note that non-integer values of S_a are acceptable, since S_a is only used to compute the range of starting positions of the windows at each iteration.

The maximum shift K_max is an important parameter. It must be chosen to be larger than the largest expected pitch period in the input signal to avoid pitch fracturing. In a voicemail application with male speakers and 8 kHz sampling, a preferred choice is K_max = 100 samples. This choice also allows synchronization of periods down to 80 Hz when time-scale modifying music.
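The parameter relations above can be collected into a small helper. This is a hypothetical convenience function, not part of the described method: it derives W from S_s and W_ov, sizes K_max to cover the longest expected pitch period fs/f0_min, checks the two-window-overlap condition S_s ≥ W_ov, and reports the number of numerator products per full search, (K_max + 1)·W_ov.

```python
import math


def solafs_params(fs, W_ov, S_s, S_a, f0_min):
    """Derive SOLAFS parameters from the rules of thumb in the text (hypothetical helper).

    fs      sampling rate in Hz
    W_ov    overlap length in samples
    S_s     synthesis shift in samples
    S_a     analysis shift in samples
    f0_min  lowest expected fundamental frequency in Hz
    """
    W = S_s + W_ov                        # W = S_s + W_ov
    K_max = math.ceil(fs / f0_min)        # cover the longest pitch period
    return {
        "W": W,
        "K_max": K_max,
        "at_most_two_overlap": S_s >= W_ov,
        "output_per_input": S_s / S_a,    # output length / input length
        "search_products": (K_max + 1) * W_ov,
    }


# 8 kHz voicemail settings for compression by 2, with an 80 Hz lowest pitch.
p = solafs_params(fs=8000, W_ov=40, S_s=80, S_a=160, f0_min=80)
print(p)
# {'W': 120, 'K_max': 100, 'at_most_two_overlap': True, 'output_per_input': 0.5, 'search_products': 4040}
```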

It is not necessary to choose S_a to be larger than K_max. However, if S_a < K_max, some care should be taken to ensure that, during analysis, each window starts no earlier than the previous window, i.e., k_m + S_a ≥ k_{m-1}. Thus, best results occur if eqn. (13) is modified so that the maximum over R^m_xy[k] is computed only over the range max(0, k_{m-1} - S_a) ≤ k ≤ K_max.
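The restricted search range amounts to a one-line change to the shift search. In this hypothetical helper, `k_prev` stands for k_{m-1}; the returned range contains exactly the shifts that keep mS_a + k_m ≥ (m-1)S_a + k_{m-1}, so no analysis window starts before its predecessor.

```python
def search_range(k_prev, S_a, K_max):
    """Allowed shifts for window m when S_a may be smaller than K_max:
    max(0, k_{m-1} - S_a) <= k_m <= K_max."""
    return range(max(0, k_prev - S_a), K_max + 1)


# With S_a = 40, K_max = 100 and a previous shift of 90, shifts below 50 are excluded.
r = search_range(k_prev=90, S_a=40, K_max=100)
print(r.start, r.stop)   # 50 101

# Check: every allowed shift keeps window m at or after window m - 1.
m = 5
assert all(m * 40 + k >= (m - 1) * 40 + 90 for k in r)
```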

Evaluations of SOLAFS were performed using speech from male and female speakers which was bandlimited to 3.8 kHz and sampled at 8 kHz using 16-bit linear quantization. High-quality output was obtained over a wide range of window lengths, analysis shifts, and synthesis shifts. In all cases, choosing K_max to be less than the duration of the largest pitch period in the signal drastically degraded output signal quality. Very slight fluttering was detectable in voiced segments of speech compressed by a factor of 2 with W_ov = 20 samples. This artifact diminished rapidly with increasing W_ov and was undetectable at W_ov = 40 samples.

The following parameter choices provided high-quality output for time-scale expansion by 2 (α = 0.5): W = 120, S_a = 40, S_s = 80, and K_max = 100, where these parameter values are given in numbers of 8 kHz samples. High-quality speech time-scale compressed by 2 (α = 2) was obtained with W = 120, S_a = 160, S_s = 80, and K_max = 100 for a sampling rate of 8 kHz. Slight improvements in quality may be gained by decreasing S_a and W, though such improvements are barely audible.

The amount of time-scale modification performed, the quality, or the computational efficiency of the method can be altered during processing of a particular signal by changing the parameter values W, S_s, or S_a. Recall that α = S_s/S_a, so that a decrease or increase in S_a will cause an increase or decrease in α, respectively. It may also be desirable to change W or S_s, in which case the quantity W_ov = W - S_s may change, but operation of the method will otherwise remain the same.

Those of ordinary skill in the art will readily appreciate that numerous different types of similarity measures may be used to determine shift values in carrying out the inventive method. Further, the number of computations required to provide a similarity measure would be reduced if the similarity measure did not include a normalizing denominator. Such a similarity measure may be developed by considering that alignment affects quality most during periodic portions of the speech signal. These portions of the speech signal represent voiced segments, which have periods between 3.75 msec and 12.5 msec (30 and 100 samples at an 8 kHz sampling rate). If one assumes that the fundamental (pitch) frequency is the highest-amplitude component in these portions, it is valid to assume that the shift which results in the highest number of agreeing signs will also align these periods. This gives the following similarity measure:



\[
R^{m}_{\mathrm{sgn}}[k] = \sum_{j=0}^{W_{ov}-1} \operatorname{sign}\!\left(y[mS_s + j]\right)\,\operatorname{sign}\!\left(x[mS_a + k + j]\right)
\tag{17}
\]

This similarity measure weighs all samples equally, and it eliminates the need to normalize the similarity measure by signal power. Further, this similarity measure makes full use of the periodic structure of those portions of the input speech signal which are most sensitive to alignment. In essence, it converts a complicated input speech signal into a square wave of unity amplitude whose zero crossings match those of the speech signal; as a result, the number of agreeing signs is equivalent, up to a constant offset and scale, to a cross-correlation of this unity-amplitude square wave. The resulting similarity measure is, therefore, a good approximation to the more complex cross-correlation and, yet, requires no multiplications. Thus, in determining this similarity measure, a key operation performed on the data is an exclusive OR (XOR) of the sign bits of the data. Since only the sign bits are used, an efficient embodiment involves stripping the sign bits from the data and loading them into a buffer of bit length equal to (W + K_max). A similar buffer holds the sign bits of the last W_ov points in the output buffer. The desired shift then corresponds to the bit offset between the buffers which provides the largest number of 0's (i.e., false results) in the XOR result over the W_ov points from the output and input (head_buf) buffers. Digital signal processors which perform this type of population count of bits on numbers in a single instruction are commercially available. Note that such an embodiment advantageously permits operation on blocks of the input data rather than on single samples, for example, 8 samples for byte operations, 16 samples for word operations, and so forth. Alternatively, the input signal can be preprocessed to +1 or -1 for all samples. A single-bit multiply-accumulate would then correspond to the number of agreeing signs, and, assuming fewer than 256 overlapping points, only 8 bits plus a sign bit would be required for the accumulation sum.
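The sign-agreement measure of eqn. (17) and its bit-packed XOR form can be sketched and compared directly. Assumptions, for illustration only: Python integers stand in for the bit buffers, `bin(...).count("1")` stands in for a hardware population-count instruction, a sample is treated as negative exactly when its sign bit would be set (so zero counts as positive), and all function names are hypothetical. Under these conventions the eqn. (17) sum equals agreements minus disagreements, i.e., 2·(agreements) - W_ov, so both forms pick the same shift.

```python
import random


def sign_sum(y_tail, head, k):
    """Eqn. (17) over W_ov = len(y_tail) samples: +1 per agreeing sign, -1 otherwise."""
    def s(v):
        return -1 if v < 0 else 1            # zero treated as positive, like a sign bit
    return sum(s(a) * s(b) for a, b in zip(y_tail, head[k:k + len(y_tail)]))


def pack_sign_bits(samples):
    """Bit j is 1 when samples[j] is negative (i.e. its sign bit is set)."""
    bits = 0
    for j, v in enumerate(samples):
        if v < 0:
            bits |= 1 << j
    return bits


def xor_best_shift(y_tail, head, K_max):
    """Shift whose W_ov sign bits in head agree most with y_tail (fewest 1s in the XOR)."""
    W_ov = len(y_tail)
    mask = (1 << W_ov) - 1
    y_bits, h_bits = pack_sign_bits(y_tail), pack_sign_bits(head)
    best_k, best_agree = 0, -1
    for k in range(K_max + 1):
        diff = ((h_bits >> k) ^ y_bits) & mask   # bit offset k between the buffers
        agree = W_ov - bin(diff).count("1")      # population count of disagreements
        if agree > best_agree:
            best_k, best_agree = k, agree
    return best_k


random.seed(1)
W_ov, K_max = 30, 25
head = [random.gauss(0.0, 1.0) for _ in range(W_ov + K_max)]   # head_buf stand-in
y_tail = head[12:12 + W_ov]                                    # hypothetical output tail
print(xor_best_shift(y_tail, head, K_max))
```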

We have determined that alignment is most critical during voiced portions of speech signals. The nature of the signal in these portions, i.e., large-amplitude fundamental periods, makes it possible to reduce computation by evaluating the similarity measure using decimated data and by evaluating the similarity measure with reduced shift resolution, for example, by evaluating the similarity measure for every other shift. It is also possible to overlap-add (linearly fade) over more data points than are used in the similarity measure calculation. This allows smoother transitions without an increase in computation, but restricts the similarity measure determination to a fraction of the total segments to be overlap-added.
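Both savings can be sketched together. This is an assumption-laden illustration rather than the described embodiment: decimation is taken to mean using every d-th sample of the overlap region, reduced resolution means evaluating only every `step`-th shift, and the eqn. (15) criterion is applied to the decimated data. The function names are hypothetical, and the returned shift is always a multiple of `step`.

```python
import math


def decimated_score(y_tail, x, base, k, d):
    """Eqn. (15) criterion r_xy*|r_xy| / r_xx using only every d-th overlap sample."""
    idx = range(0, len(y_tail), d)
    r_xy = sum(y_tail[n] * x[base + k + n] for n in idx)
    r_xx = sum(x[base + k + n] ** 2 for n in idx)
    return r_xy * abs(r_xy) / r_xx if r_xx > 0 else 0.0


def reduced_search(y_tail, x, base, K_max, d=2, step=2):
    """Evaluate shifts 0, step, 2*step, ... <= K_max on data decimated by d."""
    return max(range(0, K_max + 1, step),
               key=lambda k: decimated_score(y_tail, x, base, k, d))


# Hypothetical voiced-like input: a sinusoid with a 40-sample period.
x = [math.sin(2 * math.pi * n / 40) for n in range(300)]
base, W_ov, K_max = 100, 40, 30
y_tail = x[base + 6:base + 6 + W_ov]       # output tail matching a shift of 6
print(reduced_search(y_tail, x, base, K_max))
```

With d = 2 and step = 2, each search costs roughly a quarter of the full search, at the price of only finding even shifts; for large-amplitude, slowly varying voiced segments the text suggests this loss of resolution is tolerable.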

The ability to perform high-quality compression and expansion provides the means for a time-based voice compression system. When time-scale compression is followed by expansion without error, combining the two techniques reduces the data required for coding and storing speech signals. This method of compression may be combined with other compression techniques to further reduce the bit rate. Time-scale compressed speech may also be encoded using alternative techniques which are well known to those of ordinary skill in the art, such as, for example, vector quantization, quadrature mirror filtering, and pulse code modulation. After decoding, the time-scale compressed signal is expanded by an appropriate factor to obtain speech with the original time-scale.

Although the inventive SOLAFS method has been described with reference to its application to samples of a signal for ease of understanding, it should be noted that the inventive method is not limited to operating on samples of the signal. In particular, the method operates by searching for similar regions in an input and an output and then overlapping the regions to produce a time-scale modified output. The method can also be applied to numerous signal representations other than samples. For example, it is possible to use the inventive method by searching for similar regions in an input stream and an output stream of signal representations using an appropriate similarity measure, and then overlapping the regions by combining the signal representations to produce a time-scale modified output stream of signal representations. As one particular example, in sub-band coding, the data necessary to represent a portion of a signal is reduced by encoding information about the energy in specific frequency bands. In using the inventive SOLAFS method on the sub-band coded representation of the signal, similar sub-band characteristics would be merged to form an output stream of signal representations of the time-scale modified signal. Employing the method in this way reduces the overhead associated with converting the input stream of encoded signal representations to an input stream of samples before processing.

Claims 

1. A method for time-scale modification of a signal comprised of an input stream of signal representations to form an output stream of signal representations, the method comprising the steps of:

determining an input block of W signal representations from the input stream for use in overlapping signal representations from the input block with signal representations in the output stream; and

overlapping W_ov signal representations from the beginning of the input block with W_ov signal representations from the end of the output stream, where W_ov is determined by W and the time-scale modification.

2. The method of claim 1 wherein the step of overlapping comprises the step of:

applying a weighting function to W_ov signal representations from the beginning of the input block and to W_ov signal representations from the end of the output stream to determine values of W_ov signal representations to be substituted for the W_ov signal representations at the end of the output stream;
and wherein the step of overlapping further comprises the step of:

placing W - W_ov = S_s signal representations from the input stream at the end of the output stream, the S_s signal representations being subsequent to the W_ov signal representations from the beginning of the input block.

3. The method of claim 2 wherein: 

the step of determining an input block comprises the steps of: 

determining an initial input block of W + K_max signal representations from the input stream, where K_max is a predetermined amount;

determining a maximum of a similarity measure between W_ov signal representations from the initial input block and W_ov signal representations from the end of the output stream over a fixed search range of K_max signal representations, the search starting at the beginning of the initial input block; and

determining the input block to comprise W signal representations which begin at the sample in the initial input block whose W_ov signal representations provided a maximum of the similarity measure.

4. The method of claim 3 wherein the step of determining an initial input block comprises the step of: 

determining the first signal representation of the m-th initial input block as being the signal representation which occurs mS_a signal representations after the first sample in the input stream, where S_a is a predetermined amount;

and wherein the step of determining a maximum of the similarity measure comprises the steps of:
determining a similarity measure for the W_ov signal representations starting at the beginning of the initial input block and the W_ov signal representations at the end of the output stream;

shifting the beginning of the initial input block and repeating the previous step over the fixed search range; and

determining the maximum similarity measure.
5. The method of claim 4 wherein the similarity measure is a cross-correlation.

6. The method of claim 5 wherein the weighting function is an average.

7. The method of claim 3 wherein the step of determining a maximum of a similarity measure comprises the steps of:

determining a single-bit, square-wave, correlation function.

8. The method of claim 7 wherein the step of determining a single-bit, square-wave, correlation function comprises the step of determining a logical exclusive OR of sign-bits of the signal representations.

9. The method of claim 5 wherein the weighting function provides a linear fade. 

10. A method for time-scale modification of a signal comprised of an input stream of signal representations 
to form an output stream of signal representations, the method comprising the steps of: 

determining a number of signal representations for use in overlapping signal representations from the input stream to the output stream, W_ov;

determining an input block of W signal representations from the input stream for use in overlapping signal representations from the input block with signal representations in the output stream; and

overlapping W_ov signal representations from the beginning of the input block with W_ov signal representations from the end of the output stream.

11. The method of claim 10 wherein the step of overlapping comprises the step of: 

applying a weighting function to W_ov signal representations from the beginning of the input block and to W_ov signal representations from the end of the output stream to determine values of W_ov signal representations to be substituted for the W_ov signal representations at the end of the output stream;
and wherein the step of overlapping further comprises the step of:

placing W - W_ov = S_s signal representations from the input stream at the end of the output stream, the S_s signal representations being subsequent to the W_ov signal representations from the beginning of the input block.

12. The method of claim 11 wherein: 

the step of determining an input block comprises the steps of: 

determining an initial input block of W + K_max signal representations from the input stream, where K_max is a predetermined amount;

determining a maximum of a similarity measure between W_ov signal representations from the initial input block and W_ov signal representations from the end of the output stream over a fixed search range of K_max signal representations, the search starting at the beginning of the initial input block; and

determining the input block to comprise W signal representations which begin at the sample in the initial input block whose W_ov signal representations provided a maximum of the similarity measure.

13. The method of claim 12 wherein the step of determining an initial input block comprises the step of: 

determining the first sample of the m-th initial input block as being the sample which occurs mS_a signal representations after the first sample in the input stream, where S_a is a predetermined amount;
and wherein the step of determining a maximum of the similarity measure comprises the steps of:

determining a similarity measure for the W_ov signal representations starting at the beginning of the initial input block and the W_ov signal representations at the end of the output stream;

shifting the beginning of the initial input block and repeating the previous step over the fixed search range; and

determining the maximum similarity measure.

14. The method of claim 13 wherein the similarity measure is a cross-correlation.

15. The method of claim 14 wherein the weighting function is an average.

16. The method of claim 12 wherein the step of determining a maximum of a similarity measure comprises the steps of:

determining a single-bit, square-wave, correlation function.

17. The method of claim 16 wherein the step of determining a single-bit, square-wave, correlation function comprises the step of determining a logical exclusive OR of sign-bits of the signal representations.

18. The method of claim 14 wherein the weighting function provides a linear fade.



19. A method which comprises the steps of: 

time-scale modifying a signal comprised of an input stream of signal representations to form an output stream of signal representations, wherein at least one of the steps of time-scale modifying comprises:

determining an input block of signal representations from the input stream for use in appending signal representations from the input block to signal representations in the output stream, where the number appended is determined by the time-scale modification; and
appending the signal representations to the end of the output stream.

20. The method of claim 1 wherein the method comprises the further step of overlapping signal representations which are more than W_ov signal representations from the beginning of the input block.